The Usefulness of Logical Structure in Flexible Document Categorization

Authors

  • Jebari Chaker
  • Habib Ounelli
Abstract

This paper presents a new approach to automatic document categorization. By exploiting the logical structure of the document, our approach assigns an HTML document to one or more categories (thesis, paper, call for papers, email, ...). From a set of training documents, the approach generates a set of rules that are then used to categorize new documents. Its flexibility comes from associating each rule with a weight that represents the rule's importance in discriminating between the possible categories. This weight is modified dynamically each time a new document is categorized. Experiments with the proposed approach give satisfactory results.

Keywords: categorization rule, document categorization, flexible categorization, logical structure.
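The abstract does not give the rule format or the weight-update formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a rule fires when a given word appears inside a given HTML element, that a category is assigned when the fired share of its rule weights reaches a threshold, and that weights move by a fixed step after each categorization. The names `StructureExtractor`, `Rule`, `categorize`, `threshold`, and `step` are all hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Elements without a closing tag; skipped so they never sit on the stack.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class StructureExtractor(HTMLParser):
    """Collects (enclosing tag, lowercase word) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self._stack = []
        self.features = set()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self._stack:
            for word in data.lower().split():
                self.features.add((self._stack[-1], word.strip(".,:;")))


@dataclass
class Rule:
    tag: str        # logical-structure element, e.g. "h1"
    word: str       # word expected inside that element
    category: str   # category the rule votes for
    weight: float = 1.0


def categorize(html, rules, threshold=0.5, step=0.1):
    """Return the sorted list of assigned categories and update rule weights."""
    extractor = StructureExtractor()
    extractor.feed(html)
    extractor.close()

    fired, total = {}, {}
    for rule in rules:
        total[rule.category] = total.get(rule.category, 0.0) + rule.weight
        if (rule.tag, rule.word) in extractor.features:
            fired[rule.category] = fired.get(rule.category, 0.0) + rule.weight

    assigned = sorted(
        c for c in total
        if total[c] > 0 and fired.get(c, 0.0) / total[c] >= threshold
    )

    # Assumed update: within assigned categories, reward rules that fired
    # and penalize those that did not (never dropping below `step`).
    for rule in rules:
        if rule.category in assigned:
            if (rule.tag, rule.word) in extractor.features:
                rule.weight += step
            else:
                rule.weight = max(step, rule.weight - step)
    return assigned


example_rules = [
    Rule("h1", "thesis", "thesis"),
    Rule("h2", "abstract", "thesis"),
    Rule("h1", "call", "call for papers"),
    Rule("h2", "deadline", "call for papers"),
]
example_doc = "<html><body><h1>PhD Thesis</h1><h2>Abstract</h2><p>Some text.</p></body></html>"
example_result = categorize(example_doc, example_rules)
```

Because a document can pass the threshold for several categories at once, the function returns a list rather than a single label, which mirrors the "one or more categories" assignment described in the abstract. In the paper the rules are generated from training documents; here they are written by hand purely for illustration.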

Similar resources

The Logical Foundations of Designing a Public Policy-Making System for Achieving Truth-Oriented Justice (Based on Nahj al-Balagha)

This article presents part of the findings of a study designed to determine the characteristics of the desired public policy-making system for achieving social justice. To begin with, James P. Sterba's categorization of alternative political perspectives on justice is reviewed, and then "truth-oriented" justice is studied. To reach a precise and scholar...

Text Type Structure And Logical Document Structure

Most research on automated categorization of documents has concentrated on the assignment of one or many categories to a whole text. However, new applications, e.g. in the area of the Semantic Web, require a richer and more fine-grained annotation of documents, such as detailed thematic information about the parts of a document. Hence we investigate the automatic categorization of text segments...

A Flexible Skew-Generalized Normal Distribution

In this paper, we consider a flexible skew-generalized normal distribution. This distribution is denoted by $FSGN(\lambda_1, \lambda_2, \theta)$. It contains the normal, skew-normal (Azzalini, 1985), skew generalized normal (Arellano-Valle et al., 2004) and skew flexible-normal (Gomez et al., 2011) distributions as special cases. Some important properties of this distribution are establi...

Oil and Iran Regions Rural Economic Structure Alteration

Since 1950, oil has gradually gained a predominant place in the national economy, and today it is the main resource securing the country's financial needs. This research rests on two questions concerning the steadily intensifying contradiction between oil rent and traditional economic sectors, including agriculture and livestock rearing. These two questions are as follows: what ar...

Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML

As we move into the big data world, where unstructured data is growing at high velocity, a data store is needed to hold this volume of data. Traditionally, relational databases have been used, but they are not suited to handling data at this scale, so a move to non-relational data stores is needed. In the current study, we have proposed an extension of the Mongo...

Publication date: 2004